This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
Load the libraries
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.1 ✔ dplyr 0.8.0.1
## ✔ tidyr 0.8.3 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.5.2
## Warning: package 'tibble' was built under R version 3.5.2
## Warning: package 'tidyr' was built under R version 3.5.2
## Warning: package 'purrr' was built under R version 3.5.2
## Warning: package 'dplyr' was built under R version 3.5.2
## Warning: package 'stringr' was built under R version 3.5.2
## Warning: package 'forcats' was built under R version 3.5.2
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(RCurl)
## Warning: package 'RCurl' was built under R version 3.5.2
## Loading required package: bitops
##
## Attaching package: 'RCurl'
## The following object is masked from 'package:tidyr':
##
## complete
library(dplyr)
library(readr)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.5.2
Load the data
test3 <- getURL("https://raw.githubusercontent.com/fivethirtyeight/data/master/marriage/divorce.csv")
divorcedata <- read.csv(text = test3)
This was an attempt at finding code to make the y scale a percent. The code tested is shown inline rather than in a code chunk, because the file will not knit with incorrect code: scale_y_continous(labels=scales::percent). It fails because the function name is misspelled; the correct name is scale_y_continuous.
This code chunk was an attempt to find a way to convert the data to percents.
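For reference, here is a minimal sketch of the percent-axis approach with the function name spelled correctly. It assumes the rate columns hold proportions between 0 and 1, which is an assumption about the data rather than something stated in the article. The `toy` data frame and its values are invented for illustration only.

```r
library(ggplot2)

# Toy data, invented for illustration; these are NOT values from divorce.csv
toy <- data.frame(
  year    = c(1960, 1980, 2000),
  HS_3544 = c(0.03, 0.12, 0.17)
)

p <- ggplot(toy) +
  geom_line(mapping = aes(x = year, y = HS_3544)) +
  # Note the spelling: scale_y_continuous, not scale_y_continous.
  # The limits are given on the proportion scale (0.25 means 25%).
  scale_y_continuous(labels = scales::percent, limits = c(0, 0.25))
p
```

With this approach the data stay as proportions and only the axis labels change, so no extra percent columns are needed.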
help(percent)
HS_percent <- percent(divorcedata$HS_3544)
divorcedata$HS_percent = divorcedata$HS_3544*(100)
The following are attempts at converting the data to percents. The article did not explain how the “completed college” variable was derived, so I had to try several different ways and graph each one to see if I got the same line that they did.
divorcedata$HS_percent = divorcedata$HS_3544*(100)
divorcedata$HSP = divorcedata$HS_3544*(100)
divorcedata$CSTotal = divorcedata$BAp_3544 + divorcedata$BAo_3544 + divorcedata$GD_3544
divorcedata$CSTotal2 = divorcedata$BAp_3544 + divorcedata$BAo_3544
divorcedata$CSP2 = divorcedata$CSTotal2*(100)
divorcedata$CSP = divorcedata$CSTotal*(100)
divorcedata$SCP = divorcedata$SC_3544*(100)
I attempted to plot just one line with a percent scale. It failed at scale_y_continous (a misspelling of scale_y_continuous): HS <- ggplot(divorcedata) + geom_line(mapping = aes(x = year, y = HS_3544)) HS + scale_y_continous(labels=scales::percent)
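The percent conversions in the chunk above repeat the same multiplication for each column. As a rough sketch, the same step could be done for several columns at once in base R. The `toy` data frame is invented for illustration, and the `_pct` suffix is a hypothetical naming choice, not the names used elsewhere in this document.

```r
# Toy data, invented for illustration; column names follow divorce.csv,
# but the values are NOT taken from it
toy <- data.frame(
  year     = c(1960, 1980, 2000),
  HS_3544  = c(0.03, 0.12, 0.17),
  SC_3544  = c(0.04, 0.14, 0.20),
  BAp_3544 = c(0.03, 0.09, 0.11)
)

# Columns to convert from proportions to percents
rate_cols <- c("HS_3544", "SC_3544", "BAp_3544")

# Multiply all of them by 100 in one step and store the results
# under new names ending in "_pct" (hypothetical suffix)
toy[paste0(rate_cols, "_pct")] <- toy[rate_cols] * 100

toy
```

Keeping the original proportion columns alongside the new ones makes it easy to compare candidate definitions on the same plot.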
This is an attempt at plotting one line
HS <- ggplot(divorcedata) +
  geom_line(mapping = aes(x = year, y = HS_3544))
Keeping code that I wanted to try to use to plot the percent, and the limit I wanted to set it to.
scale_y_continuous(name = "percent", limits = c(0, 25))
## <ScaleContinuousPosition>
## Range:
## Limits: 0 -- 25
Here I tested out the scaling code with the plotting code. It was not working.
ggplot(divorcedata) +
  geom_line(mapping = aes(x = year, y = HS_3544)) +
  scale_y_continuous(name = "percent", limits = c(0, 25))
This is when I tried to make the graph of divorce rates by education.
ggplot(divorcedata) +
  geom_line(mapping = aes(x = year, y = HS_3544))
I had to do further testing of various percents to match theirs. I wish more information had been given on how they went from the dataset to the variables in the graph.
divorcedata$HS_percent = divorcedata$HS_3544*(100)
divorcedata$HSP = divorcedata$HS_3544*(100)
divorcedata$CSTotal = divorcedata$BAp_3544 + divorcedata$BAo_3544 + divorcedata$GD_3544
divorcedata$CSTotal2 = divorcedata$BAp_3544 + divorcedata$BAo_3544
divorcedata$CSP2 = divorcedata$CSTotal2*(100)
divorcedata$CSP = divorcedata$CSTotal*(100)
divorcedata$SCP = divorcedata$SC_3544*(100)
Successful mapping with one percent. Now time to add in more lines.
ggplot(divorcedata) +
  geom_line(mapping = aes(x = year, y = HSP)) +
  scale_y_continuous(name = "percent", limits = c(0, 25))
Figuring out how to get the colors to match as closely as possible.
ggplot(divorcedata) +
  geom_line(mapping = aes(x = year, y = HSP, color = "HSP")) +
  geom_line(mapping = aes(x = year, y = SCP, color = "Some College")) +
  geom_line(mapping = aes(x = year, y = CSP2, color = "all college")) +
  scale_y_continuous(name = "percent", limits = c(0, 25))
The final percents for the other two variables
divorcedata$HSP = divorcedata$HS_3544*(100)
divorcedata$SCP = divorcedata$SC_3544*(100)
divorcedata$BAP = divorcedata$BAp_3544*(100)
Trying to get labels
ggplot(divorcedata) +
  geom_line(mapping = aes(x = year, y = HSP), color = 'blue') +
  geom_line(mapping = aes(x = year, y = SCP), color = 'light blue') +
  geom_line(mapping = aes(x = year, y = BAP), color = 'purple') +
  scale_y_continuous(name = "percent", limits = c(0, 25)) +
  ggtitle("Divorce Rates by Education: Ages 35-44") +
  scale_color_discrete(name = "education level", breaks=c("HSP", "SCP", "BAP"), labels=c("High School", "Some College", "Bachelors")) +
  theme(legend.position = "right")
More attempts at labeling the graph (not included in a code chunk because the file will not knit): ggplot(divorcedata) + geom_line(mapping = aes(x = year, y = HSP), color = 'blue') + geom_line(mapping = aes(x = year, y = SCP), color = 'light blue') + geom_line(mapping = aes(x = year, y = BAP), color = 'purple') + scale_y_continuous(name = "percent", limits = c(0, 25)) + ggtitle("Divorce Rates by Education: Ages 35-44") + scale_color_discrete(name = "education level") + geom_dl(aes(label = Education), method = list(dl.combine("first.points", "last.points"), cex = 0.8)
Still working, but I can't get the labels exactly how I would like them.
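In the chunk with scale_color_discrete(), the colors are set outside aes(), so ggplot2 does not create a color scale and no legend appears. One possible way to get a labeled legend, sketched here under assumptions, is to map color inside aes() and supply the colors through scale_color_manual(). The `toy` data frame and the `edu_colors` vector are hypothetical names with invented values.

```r
library(ggplot2)

# Toy data, invented for illustration; these are NOT values from divorce.csv
toy <- data.frame(
  year = c(1960, 1980, 2000),
  HSP  = c(3, 12, 17),
  SCP  = c(4, 14, 20),
  BAP  = c(3, 9, 11)
)

# Colors keyed by the legend labels used inside aes() below
edu_colors <- c("High School"  = "blue",
                "Some College" = "lightblue",
                "Bachelors"    = "purple")

p <- ggplot(toy, aes(x = year)) +
  # color sits inside aes(), so ggplot2 builds a legend from these labels
  geom_line(aes(y = HSP, color = "High School")) +
  geom_line(aes(y = SCP, color = "Some College")) +
  geom_line(aes(y = BAP, color = "Bachelors")) +
  scale_color_manual(name = "education level",
                     values = edu_colors,
                     breaks = names(edu_colors)) +
  scale_y_continuous(name = "percent", limits = c(0, 25)) +
  ggtitle("Divorce Rates by Education: Ages 35-44") +
  theme(legend.position = "right")
p
```

The strings inside aes() act as legend labels, and the named vector ties each label to a color.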
ggplot(divorcedata) +
  geom_line(mapping = aes(x = year, y = HSP, label = "High School"), color = 'blue') +
  geom_line(mapping = aes(x = year, y = SCP), color = 'light blue') +
  geom_line(mapping = aes(x = year, y = BAP), color = 'purple') +
  scale_y_continuous(name = "percent", limits = c(0, 25)) +
  ggtitle("Divorce Rates by Education: Ages 35-44")
## Warning: Ignoring unknown aesthetics: label
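The "Ignoring unknown aesthetics: label" warning appears because geom_line() does not use a label aesthetic. A minimal sketch of one alternative is to add a geom_text() layer that draws the label at the last point of the line. The `toy` data and the `last_pt` helper are hypothetical, and the nudging values are guesses that would need adjusting for the real data.

```r
library(ggplot2)

# Toy data, invented for illustration; these are NOT values from divorce.csv
toy <- data.frame(
  year = c(1960, 1980, 2000),
  HSP  = c(3, 12, 17)
)

# Row with the latest year: the point where the line ends
last_pt <- toy[which.max(toy$year), ]

p <- ggplot(toy, aes(x = year, y = HSP)) +
  geom_line(color = "blue") +
  # geom_text() understands the label aesthetic; geom_line() does not
  geom_text(data = last_pt, aes(label = "High School"),
            hjust = 1, vjust = -0.8) +
  scale_y_continuous(name = "percent", limits = c(0, 25))
p
```

Because the label position comes from the data, it moves with the line if the values change, unlike a hand-placed annotation.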
AND FINALLY ALL THE PIECES TOGETHER. The code below produced my most successful figure! I also attempted to match the colors to the one online, which only used shades of blue.
ggplot(divorcedata) +
  geom_line(mapping = aes(x = year, y = HSP), color = 'cyan') +
  geom_line(mapping = aes(x = year, y = SCP), color = 'navy blue') +
  geom_line(mapping = aes(x = year, y = BAP), color = 'light blue') +
  scale_y_continuous(name = "percent", limits = c(0, 25)) +
  labs(title = "Divorce Rates by Education", subtitle = "Ages 35-44", x = "years") +
  annotate("text", x= 1990, y = 23, label = "Some college") +
  annotate("text", x= 2000, y = 13, label = "High school or less") +
  annotate("text", x= 2005, y = 5, label = "College Graduates") +
  theme_fivethirtyeight() +
  geom_segment(aes(2000, 6, xend = 2000, yend =10.1)) +
  geom_segment(aes(2000, 15, xend = 2000, yend = 17)) +
  geom_segment(aes(1990, 17, xend = 1990, yend = 21))
The original plots and article can be found at: https://fivethirtyeight.com/features/marriage-isnt-dead-yet/